Adjustable block size coherent caches
Czarek Dubnicki, Thomas J. LeBlanc

April 1992 ACM SIGARCH Computer Architecture News, Proceedings of the 19th annual
international symposium on Computer architecture, Volume 20 Issue 2

Full text available: pdf (1.24 MB) Additional Information: full citation, abstract, references, citings, index terms

Several studies have shown that the performance of coherent caches depends on the relationship 
between the granularity of sharing and locality exhibited by the program and the cache block size. 
Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache 
invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, 
but increase the number of bus or network transactions required to load data into the cache. In this
paper we ... 



Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory
multiprocessors

Milo M. K. Martin, Pacia J. Harper, Daniel J. Sorin, Mark D. Hill, David A. Wood
May 2003 ACM SIGARCH Computer Architecture News, Proceedings of the 30th annual
international symposium on Computer architecture, Volume 31 Issue 2

Full text available: pdf Additional Information: full citation, abstract, references

Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory 
multiprocessors. The destination set is the collection of processors that receive a particular coherence 
request. Snooping protocols send requests to the maximal destination set (i.e., all processors), 
reducing latency for cache-to-cache misses at the expense of increased traffic. Directory protocols 
send requests to the minimal destination set, reducing bandwidth at the expense of an indirection 
through the d ... 




Piranha: a scalable architecture based on single-chip multiprocessing 

Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton 
Sano, Scott Smith, Robert Stets, Ben Verghese 

May 2000 ACM SIGARCH Computer Architecture News , Proceedings of the 27th annual 
international symposium on Computer architecture, Volume 28 issue 2 

Full text available: pdf (191.10 KB) Additional Information: full citation, abstract, references, citings, index terms



The microprocessor industry is currently struggling with higher development costs and longer design 
times that arise from exceedingly complex processors that are pushing the limits of instruction-level 
parallelism. Meanwhile, such designs are especially ill suited for important commercial applications, 
such as on-line transaction processing (OLTP), which suffer from large memory stall times and exhibit 
little instruction-level parallelism. Given that commercial applications constitute by fa ...



Memory coherence in shared virtual memory systems
Kai Li, Paul Hudak

November 1989 ACM Transactions on Computer Systems (TOCS), Volume 7 Issue 4

Full text available: pdf (2.71 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The memory coherence problem in designing and implementing a shared virtual memory on loosely 
coupled multiprocessors is studied in depth. Two classes of algorithms, centralized and distributed, 
for solving the problem are presented. A prototype shared virtual memory on an Apollo ring based on 
these algorithms has been implemented. Both theoretical and practical results show that the memory 
coherence problem can indeed be solved efficiently on a loosely coupled multiprocessor. 




A distributed shared memory multiprocessor ASURA: memory and cache architecture
S. Mori, H. Saito, M. Goshima, S. Tomita, M. Yanagihara, T. Tanaka, D. Fraser, K. Joe, H. Nitta
December 1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing

Full text available: pdf (1.17 MB) Additional Information: full citation, references, citings, index terms



Synchronization with multiprocessor caches
Joonwon Lee, Umakishore Ramachandran

May 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 17th annual
international symposium on Computer Architecture, Volume 18 Issue 3

Full text available: pdf (1.18 MB) Additional Information: full citation, abstract, references, citings, index terms

Introducing private caches in bus-based shared memory multiprocessors leads to the cache 
consistency problem since there may be multiple copies of shared data. However, the ability to snoop 
on the bus coupled with the fast broadcast capability allows the design of special hardware support 
for synchronization. We present a new lock-based cache scheme which incorporates synchronization 
into the cache coherency mechanism. With this scheme high-level synchronization primitives as well 
as low-le ... 

Delayed consistency and its effects on the miss rate of parallel programs
Michel Dubois, Jin Chin Wang, Luiz A. Barroso, Kangwoo Lee, Yung-Syau Chen
August 1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing

Full text available: pdf (1.01 MB) Additional Information: full citation, references, citings, index terms



Cache memory performance in a UNIX environment

Cedell Alexander, William Keshlear, Furrokh Cooper, Faye Briggs

June 1986 ACM SIGARCH Computer Architecture News, Volume 14 Issue 3

Full text available: pdf (2.10 MB) Additional Information: full citation, citings, index terms



An economical solution to the cache coherence problem
James Archibald, Jean-Loup Baer

January 1984 ACM SIGARCH Computer Architecture News, Proceedings of the 11th annual
international symposium on Computer architecture, Volume 12 Issue 3

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

In this paper we review and qualitatively evaluate schemes to maintain cache coherence in tightly- 
coupled multiprocessor systems. This leads us to propose a more economical (hardware-wise), 
expandable and modular variation of the "global directory" approach. Protocols for this solution are 
described. Performance evaluation studies indicate the limits (number of processors, level of sharing) 
within which this approach is viable. 

Cache coherence protocols: evaluation using a multiprocessor simulation model
James Archibald, Jean-Loup Baer

September 1986 ACM Transactions on Computer Systems (TOCS), Volume 4 Issue 4

Full text available: pdf (1.79 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Using simulation, we examine the efficiency of several distributed, hardware-based solutions to the 
cache coherence problem in shared-bus multiprocessors. For each of the approaches, the associated 
protocol is outlined. The simulation model is described, and results from that model are presented. 
The magnitude of the potential performance difference between the various approaches indicates that 
the choice of coherence solution is very important in the design of an efficient shared-bus multi ...

Cache coherence in systems with parallel communication channels & many processors
John C. Willis, Arthur C. Sanderson, Charles R. Hill

November 1990 Proceedings of the 1990 ACM/IEEE conference on Supercomputing

Full text available: pdf Additional Information: full citation, abstract, references

This paper describes and analyzes two algorithms for maintaining cache coherence in multiprocessor 
systems with parallel communication channels and many processors. A distributed link-list relates all 
cache frames representing the same main memory block. Messages traverse the list to maintain list 
integrity, exclusive ownership, and consistent values. Memory access semantics are equivalent to a 
shared memory system without caches. Reference latency, efficiency of memory use, and hardware 
complex ...



Multi-level shared caching techniques for scalability in VMP-M/C
D. R. Cheriton, H. A. Goosen, P. D. Boyle

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the 16th annual
international symposium on Computer architecture, Volume 17 Issue 3

Full text available: pdf (1.27 MB) Additional Information: full citation, abstract, references, citings, index terms

The problem of building a scalable shared memory multiprocessor can be reduced to that of building 
a scalable memory hierarchy, assuming interprocessor communication is handled by the memory 
system. In this paper, we describe the VMP-MC design, a distributed parallel multi-computer based 
on the VMP multiprocessor design, that is intended to provide a set of building blocks for configuring 
machines from one to several thousand processors. VMP-MC uses a memory hierarchy based on 
shared caches ... 



Options for dynamic address translation in COMAs 
Xiaogang Qiu, Michel Dubois 

April 1998 ACM SIGARCH Computer Architecture News, Proceedings of the 25th annual
international symposium on Computer architecture, Volume 26 Issue 3

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings, index terms

In modern processors, the dynamic translation of virtual addresses to support virtual memory is done 
before or in parallel with the first-level cache access. As processor technology improves at a rapid 
pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on
the TLB (Translation Lookaside Buffer) are getting more and more difficult to meet. The situation is 
worse in multiprocessor systems, which run larger applications and are plagued by the TLB 
consiste ... 



A class of compatible cache consistency protocols and their support by the IEEE futurebus
P. Sweazey, A. J. Smith

June 1986 ACM SIGARCH Computer Architecture News, Proceedings of the 13th annual
international symposium on Computer architecture, Volume 14 Issue 2

Full text available: pdf (1.05 MB) Additional Information: full citation, abstract, references, citings, index terms

Standardization of a high performance backplane bus, so that it can accommodate boards developed
by different vendors, implies the need for a standardized cache consistency protocol. In this paper we 
define a class of compatible consistency protocols supported by the current IEEE Futurebus design. 
We refer to this class as the MOESI class of protocols; the term "MOESI" is derived from the names of 
the states. This class of protocols has the property that any system component ca ... 



Verification of an Industrial CC-NUMA Server
Rajarshi Mukherjee, Yozo Nakayama, Toshiya Mima

January 2002 Proceedings of the 2002 conference on Asia South Pacific design automation/VLSI
Design

Full text available: pdf, Publisher Site Additional Information: full citation, abstract

Directed test program-based verification or formal verification methods are usually quite ineffective on
large cache-coherent, non-uniform memory access (CC-NUMA) multi-processors because of the size 
and complexity of the design and the complexity of the cache-coherence protocol. A controllable 
biased/constrained random stimuli generator coupled with an error detection mechanism using 
scoreboards and feedback with coverage analysis tools is a promising alternative methodology. We 
applied this met ... 



Hive: fault containment for shared-memory multiprocessors
J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, A. Gupta

December 1995 ACM SIGOPS Operating Systems Review, Proceedings of the fifteenth ACM
symposium on Operating systems principles, Volume 29 Issue 5

Additional Information: full citation, references, citings, index terms



Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Manuel E. Acacio, Jose Gonzalez, Jose M. Garcia, Jose Duato

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Cache misses for which data must be obtained from a remote cache (cache-to-cache transfer misses)
account for an important fraction of the total miss rate. Unfortunately, cc-NUMA designs put the
access to the directory information into the critical path of 3-hop misses, which significantly penalizes 
them compared to SMP designs. This work studies the use of owner prediction as a means of 
providing cc-NUMA multiprocessors with a more efficient support for cache-to-cache transfer misses. 
Our propo ... 



A cache coherence approach for large multiprocessor systems
J. K. Archibald

June 1988 Proceedings of the 2nd international conference on Supercomputing

Full text available: pdf (1.05 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper explores the architecture of high-performance large scale multiprocessors using private 
caches for each processor. The caches reduce the average memory access time, but they also result 
in the well known cache coherence problem. Multiple copies of each memory location are allowed to 
exist but they must be kept consistent with each other. In this paper, we present a solution to the 
cache coherence problem specifically for shared bus multiprocessors that adapts dyn ... 



Token coherence: decoupling performance and correctness
Milo M. K. Martin, Mark D. Hill, David A. Wood

May 2003 ACM SIGARCH Computer Architecture News, Proceedings of the 30th annual
international symposium on Computer architecture, Volume 31 Issue 2

Full text available: pdf (269.08 KB) Additional Information: full citation, abstract, references

Many future shared-memory multiprocessor servers will both target commercial workloads and use 
highly-integrated "glueless" designs. Implementing low-latency cache coherence in these systems is 
difficult, because traditional approaches either add indirection for common cache-to-cache misses 
(directory protocols) or require a totally-ordered interconnect (traditional snooping protocols). 
Unfortunately, totally-ordered interconnects are difficult to implement in glueless designs. An ideal 
coherenc ... 



Multicast snooping: a new coherence method using a multicast address network 
E. Ender Bilir, Ross M. Dickson, Ying Hu, Manoj Plakal, Daniel J. Sorin, Mark D. Hill, David A. Wood
May 1999 ACM SIGARCH Computer Architecture News, Proceedings of the 26th annual
international symposium on Computer architecture, Volume 27 Issue 2

Full text available: pdf, Publisher Site Additional Information: full citation, abstract, references, citings, index terms

This paper proposes a new coherence method called "multicast snooping" that dynamically adapts
between broadcast snooping and a directory protocol. Multicast snooping is unique because
processors predict which caches should snoop each coherence transaction by specifying a multicast
"mask." Transactions are delivered with an ordered multicast network, such as an Isotach network,
which eliminates the need for acknowledgment messages. Processors handle transactions as they 
would with a snoop ...



Adjustable block size coherent caches
Czarek Dubnicki, Thomas J. LeBlanc

April 1992 ACM SIGARCH Computer Architecture News, Proceedings of the 19th annual
international symposium on Computer architecture, Volume 20 Issue 2

Full text available: pdf (1.24 MB) Additional Information: full citation, abstract, references, citings, index terms

Several studies have shown that the performance of coherent caches depends on the relationship 
between the granularity of sharing and locality exhibited by the program and the cache block size. 
Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache 
invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, 
but increase the number of bus or network transactions required to load data into the cache. In this
paper we ... 



Cache memory performance in a UNIX environment

Cedell Alexander, William Keshlear, Furrokh Cooper, Faye Briggs 

June 1986 ACM SIGARCH Computer Architecture News, Volume 14 issue 3 

Full text available: pdf (2.10 MB) Additional Information: full citation, citings, index terms



Memory coherence in shared virtual memory systems
Kai Li, Paul Hudak

November 1989 ACM Transactions on Computer Systems (TOCS), Volume 7 Issue 4

Full text available: pdf (2.71 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The memory coherence problem in designing and implementing a shared virtual memory on loosely 
coupled multiprocessors is studied in depth. Two classes of algorithms, centralized and distributed, 
for solving the problem are presented. A prototype shared virtual memory on an Apollo ring based on 
these algorithms has been implemented. Both theoretical and practical results show that the memory 
coherence problem can indeed be solved efficiently on a loosely coupled multiprocessor. 

Delayed consistency and its effects on the miss rate of parallel programs
Michel Dubois, Jin Chin Wang, Luiz A. Barroso, Kangwoo Lee, Yung-Syau Chen
August 1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing

Full text available: pdf (1.01 MB) Additional Information: full citation, references, citings, index terms



A class of compatible cache consistency protocols and their support by the IEEE futurebus
P. Sweazey, A. J. Smith

June 1986 ACM SIGARCH Computer Architecture News, Proceedings of the 13th annual
international symposium on Computer architecture, Volume 14 Issue 2

Full text available: pdf (1.05 MB) Additional Information: full citation, abstract, references, citings, index terms

Standardization of a high performance backplane bus, so that it can accommodate boards developed
by different vendors, implies the need for a standardized cache consistency protocol. In this paper we 
define a class of compatible consistency protocols supported by the current IEEE Futurebus design. 
We refer to this class as the MOESI class of protocols; the term "MOESI" is derived from the names of 
the states. This class of protocols has the property that any system component ca ... 



Multi-level shared caching techniques for scalability in VMP-M/C
D. R. Cheriton, H. A. Goosen, P. D. Boyle

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the 16th annual
international symposium on Computer architecture, Volume 17 Issue 3

Full text available: pdf (1.27 MB) Additional Information: full citation, abstract, references, citings, index terms

The problem of building a scalable shared memory multiprocessor can be reduced to that of building 
a scalable memory hierarchy, assuming interprocessor communication is handled by the memory 
system. In this paper, we describe the VMP-MC design, a distributed parallel multi-computer based 
on the VMP multiprocessor design, that is intended to provide a set of building blocks for configuring 
machines from one to several thousand processors. VMP-MC uses a memory hierarchy based on 
shared caches ... 



Cache coherence protocols: evaluation using a multiprocessor simulation model
James Archibald, Jean-Loup Baer

September 1986 ACM Transactions on Computer Systems (TOCS), Volume 4 Issue 4

Full text available: pdf (1.79 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Using simulation, we examine the efficiency of several distributed, hardware-based solutions to the 
cache coherence problem in shared-bus multiprocessors. For each of the approaches, the associated 
protocol is outlined. The simulation model is described, and results from that model are presented. 
The magnitude of the potential performance difference between the various approaches indicates that 
the choice of coherence solution is very important in the design of an efficient shared-bus multi ... 



A cache consistency protocol for multiprocessors with multistage networks 
P. Stenstrom

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the 16th annual
international symposium on Computer architecture, Volume 17 Issue 3

Full text available: pdf (920.91 KB) Additional Information: full citation, abstract, references, citings, index terms

A hardware based cache consistency protocol for multiprocessors with multistage networks is 
proposed. Consistency traffic is restricted to the set of caches which have a copy of a shared block. 
State information is distributed to the caches and the memory modules need not be consulted for
consistency actions. The protocol provides two operating modes: distributed write and global read. 
Distribution of writes calls for efficient multicast methods. Communication cost for multicasti ... 



A cache coherence approach for large multiprocessor systems
J. K. Archibald

June 1988 Proceedings of the 2nd international conference on Supercomputing

Full text available: pdf (1.05 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper explores the architecture of high-performance large scale multiprocessors using private 
caches for each processor. The caches reduce the average memory access time, but they also result 
in the well known cache coherence problem. Multiple copies of each memory location are allowed to 
exist but they must be kept consistent with each other. In this paper, we present a solution to the 
cache coherence problem specifically for shared bus multiprocessors that adapts dyn ... 



The Sun Fireplane system interconnect
Alan Charlesworth

November 2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM)

Full text available: pdf (224.87 KB) Additional Information: full citation, abstract, references, citings, index terms

System interconnect is a key determiner of the cost, performance, and reliability of large cache-
coherent, shared-memory multiprocessors. Interconnect implementations have to accommodate ever
greater numbers of ever faster processors. This paper describes the Sun™ Fireplane two-level cache-
coherency protocol, and its use in the medium and large-sized UltraSPARC-III-based Sun Fire™
servers.



Architecture and design of AlphaServer GS320

Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Stephen Van Doren

November 2000 Proceedings of the ninth international conference on Architectural support for
programming languages and operating systems, Volume 28, 34 Issue 5, 5

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This paper describes the architecture and implementation of the AlphaServer GS320, a cache- 
coherent non-uniform memory access multiprocessor developed at Compaq. The AlphaServer GS320 
architecture is specifically targeted at medium-scale multiprocessing with 32 to 64 processors. Each 
node in the design consists of four Alpha 21264 processors, up to 32GB of coherent memory, and an
aggressive IO subsystem. The current implementation supports up to 8 such nodes for a total of 32
processors. While s ... 



Architecture and design of AlphaServer GS320

Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Stephen Van Doren

November 2000 ACM SIGPLAN Notices, Volume 35 Issue 11

Full text available: pdf (1.67 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes the architecture and implementation of the AlphaServer GS320, a cache- 
coherent non-uniform memory access multiprocessor developed at Compaq. The AlphaServer GS320 
architecture is specifically targeted at medium-scale multiprocessing with 32 to 64 processors. Each 
node in the design consists of four Alpha 21264 processors, up to 32GB of coherent memory, and an 
aggressive IO subsystem. The current implementation supports up to 8 such nodes for a total of 32
processors. While s ... 



Implementing a cache consistency protocol

R. H. Katz, S. J. Eggers, D. A. Wood, C. L. Perkins, R. G. Sheldon

June 1985 ACM SIGARCH Computer Architecture News, Proceedings of the 12th annual
international symposium on Computer architecture, Volume 13 Issue 3

Full text available: pdf (803.11 KB) Additional Information: full citation, citings, index terms



Keywords: ownership-based protocols, shared bus multiprocessor cache consistency, single chip
implementation, snooping caches 



Sensor databases: Cache-and-query for wide area sensor databases 
Amol Deshpande, Suman Nath, Phillip B. Gibbons, Srinivasan Seshan

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on Management
of data

Full text available: pdf Additional Information: full citation, abstract, references, index terms



Webcams, microphones, pressure gauges and other sensors provide exciting new opportunities for 
querying and monitoring the physical world. In this paper we focus on querying wide area sensor 
databases, containing (XML) data derived from sensors spread over tens to thousands of miles. We 
present the first scalable system for executing XPATH queries on such databases. The system 
maintains the logical view of the data as a single XML document, while physically the data is 
fragmented across a ... 



Piranha: a scalable architecture based on single-chip multiprocessing

Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton
Sano, Scott Smith, Robert Stets, Ben Verghese

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th annual
international symposium on Computer architecture, Volume 28 Issue 2

Full text available: pdf (191.10 KB) Additional Information: full citation, abstract, references, citings, index terms

The microprocessor industry is currently struggling with higher development costs and longer design 
times that arise from exceedingly complex processors that are pushing the limits of instruction-level 
parallelism. Meanwhile, such designs are especially ill suited for important commercial applications, 
such as on-line transaction processing (OLTP), which suffer from large memory stall times and exhibit 
little instruction-level parallelism. Given that commercial applications constitute by fa ... 



A distributed shared memory multiprocessor ASURA: memory and cache architecture
S. Mori, H. Saito, M. Goshima, S. Tomita, M. Yanagihara, T. Tanaka, D. Fraser, K. Joe, H. Nitta
December 1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing

Full text available: pdf (1.17 MB) Additional Information: full citation, references, citings, index terms



An empirical evaluation of two memory-efficient directory methods
Brian W. O'Krafka, A. Richard Newton

May 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 17th annual
international symposium on Computer Architecture, Volume 18 Issue 3

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This paper presents an empirical evaluation of two memory-efficient directory methods for
maintaining coherent caches in large shared memory multiprocessors. Both directory methods are
modifications of a scheme proposed by Censier and Feautrier [5] that does not rely on a specific
interconnection network and can be readily distributed across interleaved main memory. The
schemes considered here overcome the large amount of memory required for tags in the original 
scheme in two different ways ...



This paper presents the strategy used to verify the error logic in the Alpha 21364 microprocessor.
Traditional pre-silicon strategies of focused testing or unit-level random testing yield limited results in 
finding complex bugs in the error handling logic of a microprocessor. This paper introduces a 
technique to simulate error conditions and their recovery in a global environment using random test 
stimulus closely approximating traffic found in a real system. A significant number of bugs ...



Communication latencies within critical sections constitute a major bottleneck in some classes of
emerging parallel workloads. In this paper, we argue for the use of Inferentially Queued Locks (IQLs) 
[31], not just for efficient synchronization but also for reducing communication latencies, and we 
propose a novel mechanism, Speculative Push (SP), aimed at reducing these communication 
latencies. With IQLs, the processor infers the existence, and limits, of a critical section from the use 
of synch ...


In modern processors, the dynamic translation of virtual addresses to support virtual memory is done 
before or in parallel with the first-level cache access. As processor technology improves at a rapid 
pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on
the TLB (Translation Lookaside Buffer) are getting more and more difficult to meet. The situation is 
worse in multiprocessor systems, which run larger applications and are plagued by the TLB 
consiste ...